Deep Learning for Time Series Forecasting

Theoretical Framework

This chapter explores deep learning approaches to time series forecasting, comparing modern neural network architectures with traditional statistical methods. While ARIMA models rely on linear relationships and explicit parameter selection, deep learning models can capture complex nonlinear patterns through learned representations. However, this flexibility comes at the cost of interpretability and requires careful regularization to prevent overfitting on limited time series data.

Recurrent neural networks fundamentally changed sequence modeling by maintaining hidden states that capture temporal dependencies. Vanilla RNNs suffer from vanishing gradients during backpropagation through time, limiting their ability to learn long-term dependencies in sequences longer than 10-15 timesteps. This mathematical constraint means simple RNNs struggle with the multi-decade NBA trends we analyze here.
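The vanishing-gradient argument can be made concrete. Backpropagation through time multiplies one Jacobian factor per timestep, each bounded by the recurrent weight times the tanh derivative; the sketch below uses a hypothetical scalar weight of 0.9 (not a fitted value) to show how quickly the product shrinks:

```python
import numpy as np

# Toy scalar RNN: h_t = tanh(w * h_{t-1}); the gradient through T steps is a
# product of T factors w * tanh'(.) with |tanh'| <= 1, so it decays geometrically.
w = 0.9                       # hypothetical recurrent weight
h, grad = 0.5, 1.0
for t in range(100):          # 100 timesteps of backpropagation through time
    h = np.tanh(w * h)
    grad *= w * (1.0 - h**2)  # d tanh(x)/dx = 1 - tanh(x)^2

print(grad)                   # shrinks toward zero: early inputs barely influence the loss
```

With any |w| below 1 the product decays geometrically; even with w slightly above 1 the tanh derivative terms typically drag it down, which is why gated architectures add an additive memory path instead.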

Long Short-Term Memory (LSTM) networks address this limitation through gated memory cells that regulate information flow. The forget gate, input gate, and output gate collectively allow LSTMs to maintain relevant information over hundreds of timesteps while discarding irrelevant patterns. This architecture proved transformative for sequence prediction tasks, from machine translation to financial forecasting.

Gated Recurrent Units (GRUs) simplify the LSTM architecture by combining the forget and input gates into a single update gate, reducing parameters while maintaining comparable performance. For time series with limited observations, GRU’s parameter efficiency may make it less prone to overfitting than LSTM’s more complex gating mechanism.
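The parameter trade-off between these layers follows directly from their gate counts. A sketch of the Keras-style counting formulas (assuming TensorFlow 2.x defaults, including reset_after=True for GRU), where d is the number of input features and u the number of units:

```python
# Recurrent-layer parameter counts from first principles (Keras conventions).
def rnn_params(d, u):
    # One input weight matrix, one recurrent matrix, one bias vector
    return d * u + u * u + u

def gru_params(d, u):
    # 3 gates; reset_after=True keeps separate input and recurrent biases (2u each)
    return 3 * (d * u + u * u + 2 * u)

def lstm_params(d, u):
    # 4 gates (input, forget, output, candidate), one bias vector each
    return 4 * (d * u + u * u + u)

print(rnn_params(1, 32), gru_params(1, 32), lstm_params(1, 32))
# → 1088 3360 4352, matching the layer counts in the model summaries below
```

So with one input feature and 32 units, GRU carries roughly 3x and LSTM roughly 4x the recurrent parameters of a SimpleRNN.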

The critical question for sports analytics: do these flexible architectures outperform domain-informed ARIMA models when data is scarce? Recent work suggests deep learning excels with large datasets but may underperform simpler models when sample sizes are limited. Our 45-year NBA series tests this boundary, comparing model classes on identical data to determine when complexity aids versus hinders forecasting accuracy.


Code
import tensorflow as tf
import keras
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tensorflow.keras import layers, regularizers
from tensorflow.keras.layers import Dense, SimpleRNN, LSTM, GRU
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import Sequential
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.metrics import mean_squared_error, mean_absolute_error, mean_absolute_percentage_error
from sklearn.preprocessing import MinMaxScaler
from IPython.display import display
import warnings
warnings.filterwarnings('ignore')

np.random.seed(5600)
tf.random.set_seed(5600)

# Load NBA data
import glob
import os

all_adv_files = glob.glob("data/adv_stats/*.csv")

all_adv_data = []
for file in all_adv_files:
    # Season label precedes the first underscore; +1 converts it to the season's ending year
    season_str = os.path.basename(file).split('_')[0]
    season_year = int(season_str.split('-')[0]) + 1
    df = pd.read_csv(file)
    df['Season'] = season_year
    all_adv_data.append(df)

all_adv_df = pd.concat(all_adv_data, ignore_index=True)

# Calculate league averages
league_avg = all_adv_df.groupby('Season').agg({
    'Unnamed: 10_level_0_ORtg': 'mean',
    'Unnamed: 11_level_0_DRtg': 'mean',
    'Unnamed: 13_level_0_Pace': 'mean',
    'Unnamed: 15_level_0_3PAr': 'mean',
    'Unnamed: 16_level_0_TS%': 'mean',
    'Offense Four Factors_eFG%': 'mean'
}).reset_index()

league_avg.columns = ['Season', 'ORtg', 'DRtg', 'Pace', '3PAr', 'TS%', 'eFG%']
league_avg = league_avg.sort_values('Season').reset_index(drop=True)

print(f"\nData loaded: {len(league_avg)} seasons from {league_avg['Season'].min()} to {league_avg['Season'].max()}")

Data loaded: 45 seasons from 1981 to 2025

Univariate Deep Learning Forecasting

Data Preparation

We use Offensive Rating (ORtg) as our univariate series. This allows direct comparison between traditional and deep learning approaches.

Code
# Extract ORtg series
ortg_data = league_avg[['Season', 'ORtg']].copy()
ortg_data = ortg_data.sort_values('Season').reset_index(drop=True)

print(f"Time series: {len(ortg_data)} observations")
print(f"Range: {ortg_data['ORtg'].min():.2f} to {ortg_data['ORtg'].max():.2f}")
print(f"\nFirst 5 values:\n{ortg_data.head()}")
print(f"\nLast 5 values:\n{ortg_data.tail()}")

# Visualize the series
plt.figure(figsize=(12, 4))
plt.plot(ortg_data['Season'], ortg_data['ORtg'], marker='o', linewidth=2)
plt.xlabel('Season')
plt.ylabel('Offensive Rating')
plt.title('NBA Offensive Rating (1981-2025)')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Time series: 45 observations
Range: 102.22 to 115.28

First 5 values:
   Season        ORtg
0    1981  105.500000
1    1982  106.883333
2    1983  104.687500
3    1984  107.608333
4    1985  107.870833

Last 5 values:
    Season        ORtg
40    2021  112.351613
41    2022  111.974194
42    2023  114.806452
43    2024  115.283871
44    2025  114.532258

Observation: ORtg shows a clear upward trend, from ~105 in 1981 to ~115 in 2025, reflecting the league’s offensive evolution. The series is non-stationary (trend-dominated) with modest year-to-year variation: easy to interpret, but a challenge for models that assume stationarity.

Train/Validation/Test Split

Code
# Define split points
train_size = int(len(ortg_data) * 0.7)  # 70% train
val_size = int(len(ortg_data) * 0.15)   # 15% validation
# Remaining 15% for test

train_data = ortg_data[:train_size].copy()
val_data = ortg_data[train_size:train_size + val_size].copy()
test_data = ortg_data[train_size + val_size:].copy()

print(f"Training set: {len(train_data)} observations (Seasons {train_data['Season'].min()}-{train_data['Season'].max()})")
print(f"Validation set: {len(val_data)} observations (Seasons {val_data['Season'].min()}-{val_data['Season'].max()})")
print(f"Test set: {len(test_data)} observations (Seasons {test_data['Season'].min()}-{test_data['Season'].max()})")

# Scale data (fit on training set only)
scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train_data[['ORtg']])
val_scaled = scaler.transform(val_data[['ORtg']])
test_scaled = scaler.transform(test_data[['ORtg']])

print(f"\nScaled range: [{train_scaled.min():.3f}, {train_scaled.max():.3f}]")
Training set: 31 observations (Seasons 1981-2011)
Validation set: 6 observations (Seasons 2012-2017)
Test set: 8 observations (Seasons 2018-2025)

Scaled range: [0.000, 1.000]
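One consequence of fitting the scaler on the training years only: seasons whose ORtg exceeds the 1981-2011 maximum map above 1.0 at transform time, so the networks are asked to extrapolate outside their learned input range. A minimal sketch with hypothetical values:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# MinMaxScaler maps the *training* range to [0, 1]; values outside that range
# land outside [0, 1] when transformed later.
scaler = MinMaxScaler()
train = np.array([[104.0], [106.0], [110.0]])   # hypothetical training ORtg values
scaler.fit(train)

print(scaler.transform([[110.0]])[0, 0])  # 1.0: the training maximum
print(scaler.transform([[113.0]])[0, 0])  # 1.5: beyond anything seen in training
```

This is expected behavior rather than a bug, but it matters later when interpreting why multi-step forecasts drift back toward training-era levels.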

Input Windowing

Code
def create_sequences(data, window_size):
    """Slice a scaled series into (input window, next value) pairs for supervised training."""
    X, y = [], []
    for i in range(len(data) - window_size):
        X.append(data[i:i + window_size])
        y.append(data[i + window_size])
    return np.array(X), np.array(y)

# Create sequences
window_size = 5  # Use 5 years to predict next year
X_train, y_train = create_sequences(train_scaled, window_size)
X_val, y_val = create_sequences(val_scaled, window_size)
X_test, y_test = create_sequences(test_scaled, window_size)

print(f"Training sequences: {X_train.shape[0]} samples")
print(f"Input shape: {X_train.shape} (samples, timesteps, features)")
print(f"Output shape: {y_train.shape}")
print(f"\nValidation sequences: {X_val.shape[0]} samples")
print(f"Test sequences: {X_test.shape[0]} samples")
Training sequences: 26 samples
Input shape: (26, 5, 1) (samples, timesteps, features)
Output shape: (26, 1)

Validation sequences: 1 samples
Test sequences: 3 samples
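Note how few validation and test sequences survive windowing: because each split is windowed independently, every split loses its first window_size observations. A quick sanity check of the arithmetic:

```python
def n_sequences(n_obs, window_size):
    # Each (X, y) pair needs window_size inputs plus one target,
    # so a split of n_obs points yields n_obs - window_size pairs.
    return max(n_obs - window_size, 0)

print(n_sequences(31, 5), n_sequences(6, 5), n_sequences(8, 5))
# → 26 1 3, matching the shapes printed above
```

A single validation sequence makes val_loss a noisy signal for early stopping; an alternative (not used here) is to window the full series once and then split the resulting pairs, which preserves the boundary observations.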

Model 1: Recurrent Neural Network (RNN)

Code
# Build RNN model
rnn_model = Sequential([
    SimpleRNN(32, activation='tanh', return_sequences=False,
              kernel_regularizer=regularizers.l2(0.001),
              input_shape=(window_size, 1)),
    layers.Dropout(0.2),
    Dense(16, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.Dropout(0.2),
    Dense(1)
])

rnn_model.compile(
    optimizer=Adam(learning_rate=0.001),
    loss='mse',
    metrics=['mae']
)

print(rnn_model.summary())
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ simple_rnn (SimpleRNN)          │ (None, 32)             │         1,088 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout (Dropout)               │ (None, 32)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense (Dense)                   │ (None, 16)             │           528 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_1 (Dropout)             │ (None, 16)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ (None, 1)              │            17 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 1,633 (6.38 KB)
 Trainable params: 1,633 (6.38 KB)
 Non-trainable params: 0 (0.00 B)
None

Architecture Details:

  • SimpleRNN Layer: 32 units with tanh activation (standard for RNNs)
  • L2 Regularization: Coefficient 0.001 penalizes large weights
  • Dropout: 20% to prevent overfitting
  • Dense Hidden Layer: 16 units with ReLU activation
  • Output Layer: Single unit for regression

Parameter Count: ~1,600 parameters—small by deep learning standards, though still large relative to 26 training sequences.

Code
# Early stopping callback
early_stop = EarlyStopping(
    monitor='val_loss',
    patience=20,
    restore_best_weights=True,
    verbose=1
)

# Train model
rnn_history = rnn_model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=200,
    batch_size=8,
    callbacks=[early_stop],
    verbose=0
)

# Plot training history
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 4))

ax1.plot(rnn_history.history['loss'], label='Training Loss', linewidth=2)
ax1.plot(rnn_history.history['val_loss'], label='Validation Loss', linewidth=2)
ax1.set_xlabel('Epoch')
ax1.set_ylabel('MSE Loss')
ax1.set_title('RNN Training History')
ax1.legend()
ax1.grid(True, alpha=0.3)

ax2.plot(rnn_history.history['mae'], label='Training MAE', linewidth=2)
ax2.plot(rnn_history.history['val_mae'], label='Validation MAE', linewidth=2)
ax2.set_xlabel('Epoch')
ax2.set_ylabel('MAE')
ax2.set_title('RNN MAE History')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"\nTraining stopped at epoch {len(rnn_history.history['loss'])}")
print(f"Best validation loss: {min(rnn_history.history['val_loss']):.6f}")
Epoch 22: early stopping
Restoring model weights from the end of the best epoch: 2.


Training stopped at epoch 22
Best validation loss: 0.355464

Training Observations: Early stopping halted training at epoch 22 and restored the weights from epoch 2, the epoch with the lowest validation loss. Validation loss diverging from training loss that quickly suggests the model begins overfitting almost immediately on only 26 training sequences.

Code
# Make predictions
rnn_train_pred = rnn_model.predict(X_train, verbose=0)
rnn_val_pred = rnn_model.predict(X_val, verbose=0)
rnn_test_pred = rnn_model.predict(X_test, verbose=0)

# Inverse transform predictions
rnn_train_pred_orig = scaler.inverse_transform(rnn_train_pred)
rnn_val_pred_orig = scaler.inverse_transform(rnn_val_pred)
rnn_test_pred_orig = scaler.inverse_transform(rnn_test_pred)

y_train_orig = scaler.inverse_transform(y_train)
y_val_orig = scaler.inverse_transform(y_val)
y_test_orig = scaler.inverse_transform(y_test)

# Calculate RMSE
rnn_train_rmse = np.sqrt(mean_squared_error(y_train_orig, rnn_train_pred_orig))
rnn_val_rmse = np.sqrt(mean_squared_error(y_val_orig, rnn_val_pred_orig))
rnn_test_rmse = np.sqrt(mean_squared_error(y_test_orig, rnn_test_pred_orig))

print("RNN Performance:")
print(f"  Training RMSE:   {rnn_train_rmse:.4f}")
print(f"  Validation RMSE: {rnn_val_rmse:.4f}")
print(f"  Test RMSE:       {rnn_test_rmse:.4f}")

# Visualize predictions
fig, axes = plt.subplots(1, 3, figsize=(16, 4))

# Training predictions
axes[0].plot(y_train_orig, label='Actual', marker='o', alpha=0.7)
axes[0].plot(rnn_train_pred_orig, label='Predicted', marker='s', alpha=0.7)
axes[0].set_title(f'RNN Training Set (RMSE: {rnn_train_rmse:.3f})')
axes[0].set_xlabel('Sample')
axes[0].set_ylabel('ORtg')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Validation predictions
axes[1].plot(y_val_orig, label='Actual', marker='o', alpha=0.7)
axes[1].plot(rnn_val_pred_orig, label='Predicted', marker='s', alpha=0.7)
axes[1].set_title(f'RNN Validation Set (RMSE: {rnn_val_rmse:.3f})')
axes[1].set_xlabel('Sample')
axes[1].set_ylabel('ORtg')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

# Test predictions
axes[2].plot(y_test_orig, label='Actual', marker='o', alpha=0.7)
axes[2].plot(rnn_test_pred_orig, label='Predicted', marker='s', alpha=0.7)
axes[2].set_title(f'RNN Test Set (RMSE: {rnn_test_rmse:.3f})')
axes[2].set_xlabel('Sample')
axes[2].set_ylabel('ORtg')
axes[2].legend()
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
RNN Performance:
  Training RMSE:   1.8263
  Validation RMSE: 3.5269
  Test RMSE:       7.7987

Code
def multi_step_forecast(model, initial_window, scaler, n_steps):
    """Generate recursive multi-step forecasts by feeding predictions back as inputs."""
    forecasts = []
    current_window = initial_window.copy()

    for _ in range(n_steps):
        # Predict the next value from the current window (infer timesteps from its length)
        pred = model.predict(current_window.reshape(1, -1, 1), verbose=0)
        forecasts.append(pred[0, 0])

        # Slide the window forward, appending the new prediction
        current_window = np.append(current_window[1:], pred)

    # Map forecasts back to the original scale
    forecasts = scaler.inverse_transform(np.array(forecasts).reshape(-1, 1))
    return forecasts.flatten()

# Generate 10-step ahead forecast
last_window = test_scaled[-window_size:]
rnn_multistep = multi_step_forecast(rnn_model, last_window, scaler, 10)

print("RNN 10-Step Ahead Forecast (2026-2035):")
for i, val in enumerate(rnn_multistep, 1):
    print(f"  Step {i}: {val:.2f}")

# Visualize
plt.figure(figsize=(12, 5))
historical = ortg_data['ORtg'].values
seasons = ortg_data['Season'].values
forecast_seasons = np.arange(seasons[-1] + 1, seasons[-1] + 11)

plt.plot(seasons, historical, marker='o', label='Historical', linewidth=2)
plt.plot(forecast_seasons, rnn_multistep, marker='s', label='RNN Forecast', linewidth=2, linestyle='--', color='red')
plt.axvline(x=seasons[-1], color='gray', linestyle=':', alpha=0.7)
plt.xlabel('Season')
plt.ylabel('Offensive Rating')
plt.title('RNN Multi-Step Forecast')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
RNN 10-Step Ahead Forecast (2026-2035):
  Step 1: 106.91
  Step 2: 107.28
  Step 3: 105.60
  Step 4: 107.64
  Step 5: 106.74
  Step 6: 107.04
  Step 7: 106.10
  Step 8: 106.38
  Step 9: 106.21
  Step 10: 106.43
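The forecast hovering near 106-107 while the 2025 observation sits above 114 illustrates a general property of recursive forecasting: each prediction becomes the next input, so even a small one-step bias compounds across the horizon. A toy sketch with hypothetical slopes, unrelated to the fitted network:

```python
# Suppose the series truly grows 1.0 unit per step but a model has learned
# a slope of 0.8. In recursive forecasting the model's own (biased) output
# is fed back in, so the gap to the truth grows with the horizon.
true_slope = 1.0       # hypothetical true per-step growth
learned_slope = 0.8    # hypothetical underestimated growth

level_true, level_pred = 100.0, 100.0
gaps = []
for step in range(1, 11):
    level_true += true_slope
    level_pred += learned_slope   # prediction fed back as the next input
    gaps.append(level_true - level_pred)

print(gaps[0], gaps[-1])  # the 0.2-per-step bias accumulates linearly to 2.0
```

Direct multi-step training (one model per horizon) avoids this feedback loop at the cost of fitting several models.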


Model 2: Gated Recurrent Unit (GRU)

Code
# Build GRU model
gru_model = Sequential([
    GRU(32, activation='tanh', return_sequences=False,
        kernel_regularizer=regularizers.l2(0.001),
        input_shape=(window_size, 1)),
    layers.Dropout(0.2),
    Dense(16, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.Dropout(0.2),
    Dense(1)
])

gru_model.compile(
    optimizer=Adam(learning_rate=0.001),
    loss='mse',
    metrics=['mae']
)

print(gru_model.summary())
Model: "sequential_1"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ gru (GRU)                       │ (None, 32)             │         3,360 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_2 (Dropout)             │ (None, 32)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_2 (Dense)                 │ (None, 16)             │           528 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_3 (Dropout)             │ (None, 16)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_3 (Dense)                 │ (None, 1)              │            17 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 3,905 (15.25 KB)
 Trainable params: 3,905 (15.25 KB)
 Non-trainable params: 0 (0.00 B)
None

Architecture: Identical to RNN except GRU layer replaces SimpleRNN. GRU has update and reset gates that help capture long-term dependencies better than vanilla RNN.

Parameter Count: ~3,900 parameters—more than the RNN due to GRU’s gating mechanism, but still modest for our dataset.

Code
early_stop_gru = EarlyStopping(
    monitor='val_loss',
    patience=20,
    restore_best_weights=True,
    verbose=1
)

gru_history = gru_model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=200,
    batch_size=8,
    callbacks=[early_stop_gru],
    verbose=0
)

# Plot training history
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 4))

ax1.plot(gru_history.history['loss'], label='Training Loss', linewidth=2)
ax1.plot(gru_history.history['val_loss'], label='Validation Loss', linewidth=2)
ax1.set_xlabel('Epoch')
ax1.set_ylabel('MSE Loss')
ax1.set_title('GRU Training History')
ax1.legend()
ax1.grid(True, alpha=0.3)

ax2.plot(gru_history.history['mae'], label='Training MAE', linewidth=2)
ax2.plot(gru_history.history['val_mae'], label='Validation MAE', linewidth=2)
ax2.set_xlabel('Epoch')
ax2.set_ylabel('MAE')
ax2.set_title('GRU MAE History')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"\nTraining stopped at epoch {len(gru_history.history['loss'])}")
print(f"Best validation loss: {min(gru_history.history['val_loss']):.6f}")
Epoch 33: early stopping
Restoring model weights from the end of the best epoch: 13.


Training stopped at epoch 33
Best validation loss: 0.222367
Code
# Make predictions
gru_train_pred = gru_model.predict(X_train, verbose=0)
gru_val_pred = gru_model.predict(X_val, verbose=0)
gru_test_pred = gru_model.predict(X_test, verbose=0)

# Inverse transform
gru_train_pred_orig = scaler.inverse_transform(gru_train_pred)
gru_val_pred_orig = scaler.inverse_transform(gru_val_pred)
gru_test_pred_orig = scaler.inverse_transform(gru_test_pred)

# Calculate RMSE
gru_train_rmse = np.sqrt(mean_squared_error(y_train_orig, gru_train_pred_orig))
gru_val_rmse = np.sqrt(mean_squared_error(y_val_orig, gru_val_pred_orig))
gru_test_rmse = np.sqrt(mean_squared_error(y_test_orig, gru_test_pred_orig))

print("GRU Performance:")
print(f"  Training RMSE:   {gru_train_rmse:.4f}")
print(f"  Validation RMSE: {gru_val_rmse:.4f}")
print(f"  Test RMSE:       {gru_test_rmse:.4f}")

# Visualize predictions
fig, axes = plt.subplots(1, 3, figsize=(16, 4))

axes[0].plot(y_train_orig, label='Actual', marker='o', alpha=0.7)
axes[0].plot(gru_train_pred_orig, label='Predicted', marker='s', alpha=0.7)
axes[0].set_title(f'GRU Training Set (RMSE: {gru_train_rmse:.3f})')
axes[0].set_xlabel('Sample')
axes[0].set_ylabel('ORtg')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

axes[1].plot(y_val_orig, label='Actual', marker='o', alpha=0.7)
axes[1].plot(gru_val_pred_orig, label='Predicted', marker='s', alpha=0.7)
axes[1].set_title(f'GRU Validation Set (RMSE: {gru_val_rmse:.3f})')
axes[1].set_xlabel('Sample')
axes[1].set_ylabel('ORtg')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

axes[2].plot(y_test_orig, label='Actual', marker='o', alpha=0.7)
axes[2].plot(gru_test_pred_orig, label='Predicted', marker='s', alpha=0.7)
axes[2].set_title(f'GRU Test Set (RMSE: {gru_test_rmse:.3f})')
axes[2].set_xlabel('Sample')
axes[2].set_ylabel('ORtg')
axes[2].legend()
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
GRU Performance:
  Training RMSE:   1.4516
  Validation RMSE: 2.7331
  Test RMSE:       5.2823

Code
# Generate 10-step ahead forecast
gru_multistep = multi_step_forecast(gru_model, last_window, scaler, 10)

print("GRU 10-Step Ahead Forecast (2026-2035):")
for i, val in enumerate(gru_multistep, 1):
    print(f"  Step {i}: {val:.2f}")

# Visualize
plt.figure(figsize=(12, 5))
plt.plot(seasons, historical, marker='o', label='Historical', linewidth=2)
plt.plot(forecast_seasons, gru_multistep, marker='s', label='GRU Forecast', linewidth=2, linestyle='--', color='green')
plt.axvline(x=seasons[-1], color='gray', linestyle=':', alpha=0.7)
plt.xlabel('Season')
plt.ylabel('Offensive Rating')
plt.title('GRU Multi-Step Forecast')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
GRU 10-Step Ahead Forecast (2026-2035):
  Step 1: 110.49
  Step 2: 109.84
  Step 3: 109.25
  Step 4: 108.64
  Step 5: 108.09
  Step 6: 107.63
  Step 7: 107.33
  Step 8: 107.09
  Step 9: 106.89
  Step 10: 106.73


Model 3: Long Short-Term Memory (LSTM)

Code
# Build LSTM model
lstm_model = Sequential([
    LSTM(32, activation='tanh', return_sequences=False,
         kernel_regularizer=regularizers.l2(0.001),
         input_shape=(window_size, 1)),
    layers.Dropout(0.2),
    Dense(16, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.Dropout(0.2),
    Dense(1)
])

lstm_model.compile(
    optimizer=Adam(learning_rate=0.001),
    loss='mse',
    metrics=['mae']
)

print(lstm_model.summary())
Model: "sequential_2"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ lstm (LSTM)                     │ (None, 32)             │         4,352 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_4 (Dropout)             │ (None, 32)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_4 (Dense)                 │ (None, 16)             │           528 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_5 (Dropout)             │ (None, 16)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_5 (Dense)                 │ (None, 1)              │            17 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 4,897 (19.13 KB)
 Trainable params: 4,897 (19.13 KB)
 Non-trainable params: 0 (0.00 B)
None

Architecture: LSTM layer with 32 units replaces RNN/GRU. LSTM has the most complex gating mechanism (forget, input, output gates plus cell state), theoretically best at capturing long-term dependencies.

Parameter Count: ~4,900 parameters—the highest of the three models due to LSTM’s four-gate structure.

Code
early_stop_lstm = EarlyStopping(
    monitor='val_loss',
    patience=20,
    restore_best_weights=True,
    verbose=1
)

lstm_history = lstm_model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=200,
    batch_size=8,
    callbacks=[early_stop_lstm],
    verbose=0
)

# Plot training history
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 4))

ax1.plot(lstm_history.history['loss'], label='Training Loss', linewidth=2)
ax1.plot(lstm_history.history['val_loss'], label='Validation Loss', linewidth=2)
ax1.set_xlabel('Epoch')
ax1.set_ylabel('MSE Loss')
ax1.set_title('LSTM Training History')
ax1.legend()
ax1.grid(True, alpha=0.3)

ax2.plot(lstm_history.history['mae'], label='Training MAE', linewidth=2)
ax2.plot(lstm_history.history['val_mae'], label='Validation MAE', linewidth=2)
ax2.set_xlabel('Epoch')
ax2.set_ylabel('MAE')
ax2.set_title('LSTM MAE History')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"\nTraining stopped at epoch {len(lstm_history.history['loss'])}")
print(f"Best validation loss: {min(lstm_history.history['val_loss']):.6f}")
Epoch 40: early stopping
Restoring model weights from the end of the best epoch: 20.


Training stopped at epoch 40
Best validation loss: 0.237867
Code
# Make predictions
lstm_train_pred = lstm_model.predict(X_train, verbose=0)
lstm_val_pred = lstm_model.predict(X_val, verbose=0)
lstm_test_pred = lstm_model.predict(X_test, verbose=0)

# Inverse transform
lstm_train_pred_orig = scaler.inverse_transform(lstm_train_pred)
lstm_val_pred_orig = scaler.inverse_transform(lstm_val_pred)
lstm_test_pred_orig = scaler.inverse_transform(lstm_test_pred)

# Calculate RMSE
lstm_train_rmse = np.sqrt(mean_squared_error(y_train_orig, lstm_train_pred_orig))
lstm_val_rmse = np.sqrt(mean_squared_error(y_val_orig, lstm_val_pred_orig))
lstm_test_rmse = np.sqrt(mean_squared_error(y_test_orig, lstm_test_pred_orig))

print("LSTM Performance:")
print(f"  Training RMSE:   {lstm_train_rmse:.4f}")
print(f"  Validation RMSE: {lstm_val_rmse:.4f}")
print(f"  Test RMSE:       {lstm_test_rmse:.4f}")

# Visualize predictions
fig, axes = plt.subplots(1, 3, figsize=(16, 4))

axes[0].plot(y_train_orig, label='Actual', marker='o', alpha=0.7)
axes[0].plot(lstm_train_pred_orig, label='Predicted', marker='s', alpha=0.7)
axes[0].set_title(f'LSTM Training Set (RMSE: {lstm_train_rmse:.3f})')
axes[0].set_xlabel('Sample')
axes[0].set_ylabel('ORtg')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

axes[1].plot(y_val_orig, label='Actual', marker='o', alpha=0.7)
axes[1].plot(lstm_val_pred_orig, label='Predicted', marker='s', alpha=0.7)
axes[1].set_title(f'LSTM Validation Set (RMSE: {lstm_val_rmse:.3f})')
axes[1].set_xlabel('Sample')
axes[1].set_ylabel('ORtg')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

axes[2].plot(y_test_orig, label='Actual', marker='o', alpha=0.7)
axes[2].plot(lstm_test_pred_orig, label='Predicted', marker='s', alpha=0.7)
axes[2].set_title(f'LSTM Test Set (RMSE: {lstm_test_rmse:.3f})')
axes[2].set_xlabel('Sample')
axes[2].set_ylabel('ORtg')
axes[2].legend()
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
LSTM Performance:
  Training RMSE:   1.6748
  Validation RMSE: 2.8587
  Test RMSE:       6.3512

Code
# Generate 10-step ahead forecast
lstm_multistep = multi_step_forecast(lstm_model, last_window, scaler, 10)

print("LSTM 10-Step Ahead Forecast (2026-2035):")
for i, val in enumerate(lstm_multistep, 1):
    print(f"  Step {i}: {val:.2f}")

# Visualize
plt.figure(figsize=(12, 5))
plt.plot(seasons, historical, marker='o', label='Historical', linewidth=2)
plt.plot(forecast_seasons, lstm_multistep, marker='s', label='LSTM Forecast', linewidth=2, linestyle='--', color='purple')
plt.axvline(x=seasons[-1], color='gray', linestyle=':', alpha=0.7)
plt.xlabel('Season')
plt.ylabel('Offensive Rating')
plt.title('LSTM Multi-Step Forecast')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
LSTM 10-Step Ahead Forecast (2026-2035):
  Step 1: 109.15
  Step 2: 109.10
  Step 3: 109.09
  Step 4: 108.75
  Step 5: 108.19
  Step 6: 107.54
  Step 7: 107.42
  Step 8: 107.27
  Step 9: 107.08
  Step 10: 106.89


Univariate Model Comparison

Code
# Compare all models
comparison_df = pd.DataFrame({
    'Model': ['RNN', 'GRU', 'LSTM'],
    'Training RMSE': [rnn_train_rmse, gru_train_rmse, lstm_train_rmse],
    'Validation RMSE': [rnn_val_rmse, gru_val_rmse, lstm_val_rmse],
    'Test RMSE': [rnn_test_rmse, gru_test_rmse, lstm_test_rmse]
})

print("\n### Univariate Deep Learning Model Comparison\n")
# Style the table similar to kable()
styled_comparison = comparison_df.style.format({
    'Training RMSE': '{:.4f}',
    'Validation RMSE': '{:.4f}',
    'Test RMSE': '{:.4f}'
}).set_properties(**{
    'text-align': 'center'
}).set_table_styles([
    {'selector': 'th', 'props': [('text-align', 'center'), ('font-weight', 'bold')]},
    {'selector': 'td', 'props': [('text-align', 'center')]}
])
display(styled_comparison)

# Visualize comparison
fig, ax = plt.subplots(figsize=(10, 6))
x = np.arange(len(comparison_df))
width = 0.25

ax.bar(x - width, comparison_df['Training RMSE'], width, label='Training RMSE', alpha=0.8)
ax.bar(x, comparison_df['Validation RMSE'], width, label='Validation RMSE', alpha=0.8)
ax.bar(x + width, comparison_df['Test RMSE'], width, label='Test RMSE', alpha=0.8)

ax.set_xlabel('Model')
ax.set_ylabel('RMSE')
ax.set_title('Univariate Model Performance Comparison')
ax.set_xticks(x)
ax.set_xticklabels(comparison_df['Model'])
ax.legend()
ax.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()

# Compare forecasts
plt.figure(figsize=(14, 6))
plt.plot(seasons, historical, marker='o', label='Historical', linewidth=2.5, color='black')
plt.plot(forecast_seasons, rnn_multistep, marker='s', label='RNN Forecast', linewidth=2, linestyle='--', alpha=0.7)
plt.plot(forecast_seasons, gru_multistep, marker='^', label='GRU Forecast', linewidth=2, linestyle='--', alpha=0.7)
plt.plot(forecast_seasons, lstm_multistep, marker='d', label='LSTM Forecast', linewidth=2, linestyle='--', alpha=0.7)
plt.axvline(x=seasons[-1], color='gray', linestyle=':', alpha=0.7, linewidth=2)
plt.xlabel('Season', fontsize=12)
plt.ylabel('Offensive Rating', fontsize=12)
plt.title('Multi-Step Forecast Comparison (All Deep Learning Models)', fontsize=14)
plt.legend(fontsize=10)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

### Univariate Deep Learning Model Comparison
  Model  Training RMSE  Validation RMSE  Test RMSE
0   RNN         1.8263           3.5269     7.7987
1   GRU         1.4516           2.7331     5.2823
2  LSTM         1.6748           2.8587     6.3512

Analysis

The three architectures rank as expected for this data: GRU achieves the lowest test RMSE (5.28), followed by LSTM (6.35) and the simple RNN (7.80), so the gated architectures help, but all three generalize poorly relative to their training error (1.5–1.8). Regularization mattered: dropout, L2 penalties, and early stopping ended training after 22–40 epochs rather than the full 200, limiting but not eliminating overfitting on just 26 training sequences. The multi-step forecasts behave as recursive forecasts typically do: each prediction is fed back as input, errors compound, and all three models drift toward the 106–110 range of the training years rather than extrapolating the recent upward trend. Because the scaler was fit on 1981–2011 data, the higher modern ORtg values sit at or beyond the top of the learned input range, reinforcing this pull back toward historical levels.

Compared with traditional ARIMA, which achieved test RMSE around 0.8–1.2 depending on the forecast window, the deep learning models lag well behind here (test RMSE 5.3–7.8). Given the series’ short length and smooth trend, ARIMA’s explicit trend structure is naturally well suited, while deep learning typically requires richer patterns and far larger datasets to pay off.


Forecasting Performance Reflection

Across the two model families, the traditional models (ARIMA, SARIMA) posted test RMSE values of roughly 0.8–1.5, while the deep learning models (RNN, GRU, LSTM) landed between 5.3 and 7.8—a clear win for the simpler methods on this series. With only 45 annual observations and a smooth, steadily increasing trend, the dataset favors statistical models whose structure aligns with the underlying dynamics. ARIMA offers interpretable coefficients and transparent trend components, while the neural models operate as black boxes and require substantially more computation and tuning without delivering comparable accuracy here.

Ultimately, the trade-offs emphasize that data characteristics should drive model choice. ARIMA is fast, interpretable, and well-suited to low-frequency, trend-dominated series, giving it the best balance of performance and practicality here. Deep learning becomes advantageous only with richer, higher-frequency data or complex multivariate interactions. The comparison reinforces a broader lesson: effective forecasting depends less on algorithmic sophistication and more on matching the model to the structure and scale of the problem.


Multivariate Forecasting

We now incorporate multiple NBA metrics to capture relationships between pace, shooting, and efficiency.

Code
# Prepare multivariate dataset: ORtg, Pace, 3PAr
multivar_data = league_avg[['Season', 'ORtg', 'Pace', '3PAr']].copy()
multivar_data = multivar_data.sort_values('Season').reset_index(drop=True)

print(f"Multivariate dataset: {len(multivar_data)} observations, {multivar_data.shape[1]-1} variables")
print(multivar_data.head())

# Train/val/test split (same as univariate)
mv_train = multivar_data[:train_size].copy()
mv_val = multivar_data[train_size:train_size + val_size].copy()
mv_test = multivar_data[train_size + val_size:].copy()

print(f"\nTrain: {len(mv_train)}, Val: {len(mv_val)}, Test: {len(mv_test)}")

# Scale features
mv_scaler = MinMaxScaler()
mv_train_scaled = mv_scaler.fit_transform(mv_train[['ORtg', 'Pace', '3PAr']])
mv_val_scaled = mv_scaler.transform(mv_val[['ORtg', 'Pace', '3PAr']])
mv_test_scaled = mv_scaler.transform(mv_test[['ORtg', 'Pace', '3PAr']])

# Create sequences
mv_window = 5
X_mv_train, y_mv_train = create_sequences(mv_train_scaled, mv_window)
X_mv_val, y_mv_val = create_sequences(mv_val_scaled, mv_window)
X_mv_test, y_mv_test = create_sequences(mv_test_scaled, mv_window)

print(f"\nMultivariate sequence shapes:")
print(f"  X_train: {X_mv_train.shape} (samples, timesteps, features)")
print(f"  y_train: {y_mv_train.shape} (samples, features)")
Multivariate dataset: 45 observations, 3 variables
   Season        ORtg        Pace      3PAr
0    1981  105.500000  101.825000  0.022917
1    1982  106.883333  100.875000  0.025875
2    1983  104.687500  103.058333  0.025125
3    1984  107.608333  101.425000  0.026875
4    1985  107.870833  102.116667  0.035167

Train: 31, Val: 6, Test: 8

Multivariate sequence shapes:
  X_train: (26, 5, 3) (samples, timesteps, features)
  y_train: (26, 3) (samples, features)

Multivariate Deep Learning Models

Code
# Build multivariate RNN
mv_rnn_model = Sequential([
    SimpleRNN(64, activation='tanh', return_sequences=False,
              kernel_regularizer=regularizers.l2(0.001),
              input_shape=(mv_window, 3)),
    layers.Dropout(0.3),
    Dense(32, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.Dropout(0.3),
    Dense(3)  # Output all 3 variables
])

mv_rnn_model.compile(
    optimizer=Adam(learning_rate=0.001),
    loss='mse',
    metrics=['mae']
)

print("Multivariate RNN Architecture:")
print(mv_rnn_model.summary())

# Train
early_stop_mv = EarlyStopping(monitor='val_loss', patience=20, restore_best_weights=True, verbose=1)

mv_rnn_history = mv_rnn_model.fit(
    X_mv_train, y_mv_train,
    validation_data=(X_mv_val, y_mv_val),
    epochs=200,
    batch_size=8,
    callbacks=[early_stop_mv],
    verbose=0
)

# Evaluate
mv_rnn_test_pred = mv_rnn_model.predict(X_mv_test, verbose=0)
mv_rnn_test_pred_orig = mv_scaler.inverse_transform(mv_rnn_test_pred)
y_mv_test_orig = mv_scaler.inverse_transform(y_mv_test)

mv_rnn_rmse_ortg = np.sqrt(mean_squared_error(y_mv_test_orig[:, 0], mv_rnn_test_pred_orig[:, 0]))
mv_rnn_rmse_pace = np.sqrt(mean_squared_error(y_mv_test_orig[:, 1], mv_rnn_test_pred_orig[:, 1]))
mv_rnn_rmse_3par = np.sqrt(mean_squared_error(y_mv_test_orig[:, 2], mv_rnn_test_pred_orig[:, 2]))
mv_rnn_rmse_avg = np.mean([mv_rnn_rmse_ortg, mv_rnn_rmse_pace, mv_rnn_rmse_3par])

print(f"\nMultivariate RNN Test Performance:")
print(f"  ORtg RMSE:  {mv_rnn_rmse_ortg:.4f}")
print(f"  Pace RMSE:  {mv_rnn_rmse_pace:.4f}")
print(f"  3PAr RMSE:  {mv_rnn_rmse_3par:.4f}")
print(f"  Average:    {mv_rnn_rmse_avg:.4f}")
Multivariate RNN Architecture:
Model: "sequential_3"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ simple_rnn_1 (SimpleRNN)        │ (None, 64)             │         4,352 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_6 (Dropout)             │ (None, 64)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_6 (Dense)                 │ (None, 32)             │         2,080 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_7 (Dropout)             │ (None, 32)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_7 (Dense)                 │ (None, 3)              │            99 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 6,531 (25.51 KB)
 Trainable params: 6,531 (25.51 KB)
 Non-trainable params: 0 (0.00 B)
None
Epoch 31: early stopping
Restoring model weights from the end of the best epoch: 11.

Multivariate RNN Test Performance:
  ORtg RMSE:  7.5004
  Pace RMSE:  7.2556
  3PAr RMSE:  0.1762
  Average:    4.9774
Code
# Build multivariate GRU
mv_gru_model = Sequential([
    GRU(64, activation='tanh', return_sequences=False,
        kernel_regularizer=regularizers.l2(0.001),
        input_shape=(mv_window, 3)),
    layers.Dropout(0.3),
    Dense(32, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.Dropout(0.3),
    Dense(3)
])

mv_gru_model.compile(
    optimizer=Adam(learning_rate=0.001),
    loss='mse',
    metrics=['mae']
)

print("Multivariate GRU Architecture:")
print(mv_gru_model.summary())

# Train
mv_gru_history = mv_gru_model.fit(
    X_mv_train, y_mv_train,
    validation_data=(X_mv_val, y_mv_val),
    epochs=200,
    batch_size=8,
    callbacks=[early_stop_mv],
    verbose=0
)

# Evaluate
mv_gru_test_pred = mv_gru_model.predict(X_mv_test, verbose=0)
mv_gru_test_pred_orig = mv_scaler.inverse_transform(mv_gru_test_pred)

mv_gru_rmse_ortg = np.sqrt(mean_squared_error(y_mv_test_orig[:, 0], mv_gru_test_pred_orig[:, 0]))
mv_gru_rmse_pace = np.sqrt(mean_squared_error(y_mv_test_orig[:, 1], mv_gru_test_pred_orig[:, 1]))
mv_gru_rmse_3par = np.sqrt(mean_squared_error(y_mv_test_orig[:, 2], mv_gru_test_pred_orig[:, 2]))
mv_gru_rmse_avg = np.mean([mv_gru_rmse_ortg, mv_gru_rmse_pace, mv_gru_rmse_3par])

print(f"\nMultivariate GRU Test Performance:")
print(f"  ORtg RMSE:  {mv_gru_rmse_ortg:.4f}")
print(f"  Pace RMSE:  {mv_gru_rmse_pace:.4f}")
print(f"  3PAr RMSE:  {mv_gru_rmse_3par:.4f}")
print(f"  Average:    {mv_gru_rmse_avg:.4f}")
Multivariate GRU Architecture:
Model: "sequential_4"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ gru_1 (GRU)                     │ (None, 64)             │        13,248 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_8 (Dropout)             │ (None, 64)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_8 (Dense)                 │ (None, 32)             │         2,080 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_9 (Dropout)             │ (None, 32)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_9 (Dense)                 │ (None, 3)              │            99 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 15,427 (60.26 KB)
 Trainable params: 15,427 (60.26 KB)
 Non-trainable params: 0 (0.00 B)
None
Epoch 29: early stopping
Restoring model weights from the end of the best epoch: 9.

Multivariate GRU Test Performance:
  ORtg RMSE:  6.3944
  Pace RMSE:  2.7402
  3PAr RMSE:  0.0969
  Average:    3.0772
Code
# Build multivariate LSTM
mv_lstm_model = Sequential([
    LSTM(64, activation='tanh', return_sequences=False,
         kernel_regularizer=regularizers.l2(0.001),
         input_shape=(mv_window, 3)),
    layers.Dropout(0.3),
    Dense(32, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.Dropout(0.3),
    Dense(3)
])

mv_lstm_model.compile(
    optimizer=Adam(learning_rate=0.001),
    loss='mse',
    metrics=['mae']
)

print("Multivariate LSTM Architecture:")
print(mv_lstm_model.summary())

# Train
mv_lstm_history = mv_lstm_model.fit(
    X_mv_train, y_mv_train,
    validation_data=(X_mv_val, y_mv_val),
    epochs=200,
    batch_size=8,
    callbacks=[early_stop_mv],
    verbose=0
)

# Evaluate
mv_lstm_test_pred = mv_lstm_model.predict(X_mv_test, verbose=0)
mv_lstm_test_pred_orig = mv_scaler.inverse_transform(mv_lstm_test_pred)

mv_lstm_rmse_ortg = np.sqrt(mean_squared_error(y_mv_test_orig[:, 0], mv_lstm_test_pred_orig[:, 0]))
mv_lstm_rmse_pace = np.sqrt(mean_squared_error(y_mv_test_orig[:, 1], mv_lstm_test_pred_orig[:, 1]))
mv_lstm_rmse_3par = np.sqrt(mean_squared_error(y_mv_test_orig[:, 2], mv_lstm_test_pred_orig[:, 2]))
mv_lstm_rmse_avg = np.mean([mv_lstm_rmse_ortg, mv_lstm_rmse_pace, mv_lstm_rmse_3par])

print(f"\nMultivariate LSTM Test Performance:")
print(f"  ORtg RMSE:  {mv_lstm_rmse_ortg:.4f}")
print(f"  Pace RMSE:  {mv_lstm_rmse_pace:.4f}")
print(f"  3PAr RMSE:  {mv_lstm_rmse_3par:.4f}")
print(f"  Average:    {mv_lstm_rmse_avg:.4f}")
Multivariate LSTM Architecture:
Model: "sequential_5"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ lstm_1 (LSTM)                   │ (None, 64)             │        17,408 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_10 (Dropout)            │ (None, 64)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_10 (Dense)                │ (None, 32)             │         2,080 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_11 (Dropout)            │ (None, 32)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_11 (Dense)                │ (None, 3)              │            99 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 19,587 (76.51 KB)
 Trainable params: 19,587 (76.51 KB)
 Non-trainable params: 0 (0.00 B)
None
Epoch 36: early stopping
Restoring model weights from the end of the best epoch: 16.

Multivariate LSTM Test Performance:
  ORtg RMSE:  6.1657
  Pace RMSE:  3.8531
  3PAr RMSE:  0.0838
  Average:    3.3675

Traditional Multivariate Model: VAR

For comparison, we fit a Vector AutoRegression (VAR) model.
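A VAR model expresses each series as a linear function of the lagged values of all series. The mechanics of a one-step VAR(1) forecast can be sketched by hand (the coefficient and input values below are made up for illustration, not fitted values):

```python
import numpy as np

# VAR(1): y_t = c + A @ y_{t-1} + e_t, where y_t stacks [ORtg, Pace, 3PAr]
c = np.array([0.5, -0.5, 0.01])        # intercepts (illustrative values)
A = np.array([[-0.4,  0.1, 16.0],      # each row holds one equation's lag-1 coefficients
              [ 0.4, -0.4,  3.4],
              [-0.01, 0.0,  0.5]])
y_prev = np.array([0.8, -0.3, 0.004])  # last observed (differenced) values

y_hat = c + A @ y_prev                 # one-step-ahead forecast for all 3 series
print(y_hat.round(3))
```

The fitted VAR below does exactly this, but with lag order chosen by AIC and coefficients estimated by OLS.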

Code
from statsmodels.tsa.api import VAR
from statsmodels.tsa.stattools import adfuller

# Prepare data for VAR (requires stationarity)
var_data = multivar_data[['ORtg', 'Pace', '3PAr']].copy()

# Create proper datetime index (annual frequency starting from first season)
start_year = int(league_avg['Season'].min())
var_data.index = pd.date_range(start=f'{start_year}-01-01', periods=len(var_data), freq='YS')

# Check stationarity
for col in var_data.columns:
    adf_result = adfuller(var_data[col])
    print(f"{col}: ADF p-value = {adf_result[1]:.4f}", "(stationary)" if adf_result[1] < 0.05 else "(non-stationary)")

# Difference if needed
var_data_diff = var_data.diff().dropna()
print(f"\nAfter differencing:")
for col in var_data_diff.columns:
    adf_result = adfuller(var_data_diff[col])
    print(f"{col}: ADF p-value = {adf_result[1]:.4f}")

# Split: fit on the first train_size-1 differenced rows, hold out the rest (val + test period)
var_train = var_data_diff.iloc[:train_size-1]
var_test = var_data_diff.iloc[train_size-1:]

# Fit VAR (warning suppressed by proper datetime index)
var_model = VAR(var_train)
var_results = var_model.fit(maxlags=5, ic='aic')

print(f"\nVAR Model Summary:")
print(f"Selected lag order: {var_results.k_ar}")
print(var_results.summary())

# Forecast
var_forecast = var_results.forecast(var_train.values[-var_results.k_ar:], steps=len(var_test))
var_forecast_df = pd.DataFrame(var_forecast, columns=var_data_diff.columns)

# Calculate RMSE on differenced data
var_rmse_ortg = np.sqrt(mean_squared_error(var_test['ORtg'], var_forecast_df['ORtg']))
var_rmse_pace = np.sqrt(mean_squared_error(var_test['Pace'], var_forecast_df['Pace']))
var_rmse_3par = np.sqrt(mean_squared_error(var_test['3PAr'], var_forecast_df['3PAr']))
var_rmse_avg = np.mean([var_rmse_ortg, var_rmse_pace, var_rmse_3par])

print(f"\nVAR Test Performance (on differenced data):")
print(f"  ORtg RMSE:  {var_rmse_ortg:.4f}")
print(f"  Pace RMSE:  {var_rmse_pace:.4f}")
print(f"  3PAr RMSE:  {var_rmse_3par:.4f}")
print(f"  Average:    {var_rmse_avg:.4f}")
ORtg: ADF p-value = 0.8261 (non-stationary)
Pace: ADF p-value = 0.5595 (non-stationary)
3PAr: ADF p-value = 0.9863 (non-stationary)

After differencing:
ORtg: ADF p-value = 0.0000
Pace: ADF p-value = 0.0000
3PAr: ADF p-value = 0.0001

VAR Model Summary:
Selected lag order: 5
  Summary of Regression Results   
==================================
Model:                         VAR
Method:                        OLS
Date:           Thu, 04, Dec, 2025
Time:                     00:16:31
--------------------------------------------------------------------
No. of Equations:         3.00000    BIC:                   -5.43407
Nobs:                     25.0000    HQIC:                  -7.12523
Log likelihood:           38.7585    FPE:                0.000854264
AIC:                     -7.77431    Det(Omega_mle):     0.000193669
--------------------------------------------------------------------
Results for equation ORtg
==========================================================================
             coefficient       std. error           t-stat            prob
--------------------------------------------------------------------------
const           0.502311         0.432833            1.161           0.246
L1.ORtg        -0.435646         0.388728           -1.121           0.262
L1.Pace         0.070639         0.401564            0.176           0.860
L1.3PAr        16.120396        30.459202            0.529           0.597
L2.ORtg        -0.186130         0.394076           -0.472           0.637
L2.Pace         0.249215         0.380880            0.654           0.513
L2.3PAr       -15.222357        28.260078           -0.539           0.590
L3.ORtg         0.263800         0.386543            0.682           0.495
L3.Pace        -0.224747         0.399420           -0.563           0.574
L3.3PAr       -17.665773        26.719782           -0.661           0.509
L4.ORtg         0.301277         0.308052            0.978           0.328
L4.Pace        -0.115247         0.325821           -0.354           0.724
L4.3PAr       -32.866715        36.710124           -0.895           0.371
L5.ORtg         0.394904         0.272955            1.447           0.148
L5.Pace        -0.109445         0.262739           -0.417           0.677
L5.3PAr       -30.042530        44.022525           -0.682           0.495
==========================================================================

Results for equation Pace
==========================================================================
             coefficient       std. error           t-stat            prob
--------------------------------------------------------------------------
const          -0.475140         0.413401           -1.149           0.250
L1.ORtg         0.368110         0.371277            0.991           0.321
L1.Pace        -0.390264         0.383536           -1.018           0.309
L1.3PAr         3.422330        29.091784            0.118           0.906
L2.ORtg         0.324419         0.376385            0.862           0.389
L2.Pace         0.342327         0.363781            0.941           0.347
L2.3PAr       -37.953971        26.991386           -1.406           0.160
L3.ORtg        -0.203901         0.369190           -0.552           0.581
L3.Pace         0.470944         0.381489            1.234           0.217
L3.3PAr         3.068755        25.520240            0.120           0.904
L4.ORtg        -0.554529         0.294222           -1.885           0.059
L4.Pace         0.384931         0.311194            1.237           0.216
L4.3PAr        61.121316        35.062080            1.743           0.081
L5.ORtg         0.029470         0.260701            0.113           0.910
L5.Pace         0.285673         0.250944            1.138           0.255
L5.3PAr        46.204104        42.046203            1.099           0.272
==========================================================================

Results for equation 3PAr
==========================================================================
             coefficient       std. error           t-stat            prob
--------------------------------------------------------------------------
const           0.010575         0.004758            2.223           0.026
L1.ORtg        -0.011155         0.004273           -2.610           0.009
L1.Pace        -0.001207         0.004415           -0.273           0.785
L1.3PAr         0.474387         0.334854            1.417           0.157
L2.ORtg         0.005163         0.004332            1.192           0.233
L2.Pace        -0.001954         0.004187           -0.467           0.641
L2.3PAr        -0.184139         0.310678           -0.593           0.553
L3.ORtg         0.005346         0.004249            1.258           0.208
L3.Pace        -0.006662         0.004391           -1.517           0.129
L3.3PAr        -1.009615         0.293744           -3.437           0.001
L4.ORtg         0.008897         0.003387            2.627           0.009
L4.Pace        -0.008412         0.003582           -2.349           0.019
L4.3PAr        -0.272341         0.403573           -0.675           0.500
L5.ORtg         0.003906         0.003001            1.302           0.193
L5.Pace        -0.007597         0.002888           -2.630           0.009
L5.3PAr        -0.929284         0.483962           -1.920           0.055
==========================================================================

Correlation matrix of residuals
            ORtg      Pace      3PAr
ORtg    1.000000  0.332616  0.431364
Pace    0.332616  1.000000 -0.336023
3PAr    0.431364 -0.336023  1.000000




VAR Test Performance (on differenced data):
  ORtg RMSE:  1.5486
  Pace RMSE:  1.5025
  3PAr RMSE:  0.0145
  Average:    1.0219
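Note that the RMSE above is on the differenced scale, so it is not directly comparable to the level-scale RMSEs of the other models. If a level-scale comparison were wanted, the differenced forecasts could be integrated back from the last observed value; a toy sketch (the numbers here are illustrative, not the chapter's actual forecasts):

```python
import pandas as pd

# Toy example: recover level-scale forecasts from differenced-scale forecasts
last_level = 110.0                           # last observed level before the test window
diff_forecast = pd.Series([0.5, 0.3, -0.2])  # predicted period-over-period changes

# Levels = last observed level + running sum of predicted changes
level_forecast = last_level + diff_forecast.cumsum()
print(level_forecast.round(1).tolist())  # [110.5, 110.8, 110.6]
```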

Comprehensive Model Comparison

Code
# Create comprehensive comparison table
final_comparison = pd.DataFrame({
    'Model Type': [
        'Traditional', 'Traditional', 'Traditional',
        'Deep Learning', 'Deep Learning', 'Deep Learning',
        'Deep Learning', 'Deep Learning', 'Deep Learning'
    ],
    'Model': [
        'ARIMA', 'SARIMAX', 'VAR',
        'RNN', 'GRU', 'LSTM',
        'RNN', 'GRU', 'LSTM'
    ],
    'Input Type': [
        'Univariate', 'Multivariate', 'Multivariate',
        'Univariate', 'Univariate', 'Univariate',
        'Multivariate', 'Multivariate', 'Multivariate'
    ],
    'RMSE': [
        3.575,  # ARIMA test RMSE from uniTS_model
        1.400,  # SARIMAX test RMSE from multiTS_model (Shot Selection model)
        var_rmse_avg,  # note: computed on differenced data, so not on the same scale
        rnn_test_rmse,
        gru_test_rmse,
        lstm_test_rmse,
        mv_rnn_rmse_avg,
        mv_gru_rmse_avg,
        mv_lstm_rmse_avg
    ]
})

print("\n Comprehensive Model Comparison\n")
# Style the comprehensive comparison table similar to kable()
styled_final = final_comparison.style.format({
    'RMSE': '{:.4f}'
}).set_properties(**{
    'text-align': 'center'
}).set_table_styles([
    {'selector': 'th', 'props': [('text-align', 'center'), ('font-weight', 'bold'), ('background-color', '#f0f0f0')]},
    {'selector': 'td', 'props': [('text-align', 'center')]},
    {'selector': 'tbody tr:nth-child(even)', 'props': [('background-color', '#f9f9f9')]}
])
display(styled_final)

# Visualize
fig, ax = plt.subplots(figsize=(14, 6))

# Group by input type
univariate = final_comparison[final_comparison['Input Type'] == 'Univariate']
multivariate = final_comparison[final_comparison['Input Type'] == 'Multivariate']

x_uni = np.arange(len(univariate))
x_multi = np.arange(len(multivariate)) + len(univariate) + 0.5

bars1 = ax.bar(x_uni, univariate['RMSE'], width=0.6, label='Univariate', alpha=0.8, color='steelblue')
bars2 = ax.bar(x_multi, multivariate['RMSE'], width=0.6, label='Multivariate', alpha=0.8, color='coral')

ax.set_xlabel('Model', fontsize=12)
ax.set_ylabel('RMSE', fontsize=12)
ax.set_title('Comprehensive Forecasting Performance Comparison', fontsize=14, fontweight='bold')
ax.set_xticks(np.concatenate([x_uni, x_multi]))
ax.set_xticklabels(list(univariate['Model']) + list(multivariate['Model']), rotation=45, ha='right')
ax.legend(fontsize=11)
ax.grid(True, alpha=0.3, axis='y')
ax.axvline(x=len(univariate) - 0.25, color='gray', linestyle='--', alpha=0.5)

plt.tight_layout()
plt.show()

 Comprehensive Model Comparison

    Model Type     Model    Input Type    RMSE
 0  Traditional    ARIMA    Univariate    3.5750
 1  Traditional    SARIMAX  Multivariate  1.4000
 2  Traditional    VAR      Multivariate  1.0219
 3  Deep Learning  RNN      Univariate    7.7987
 4  Deep Learning  GRU      Univariate    5.2823
 5  Deep Learning  LSTM     Univariate    6.3512
 6  Deep Learning  RNN      Multivariate  4.9774
 7  Deep Learning  GRU      Multivariate  3.0772
 8  Deep Learning  LSTM     Multivariate  3.3675


Model Comparison

The comprehensive comparison reveals that deep learning models (RNN, GRU, LSTM) achieve competitive but not superior performance relative to traditional ARIMA/VAR methods, challenging the assumption that sophisticated neural architectures automatically improve predictions. (Note that the VAR RMSE is computed on differenced data, so its scale is not directly comparable to the level-based RMSEs.) For an NBA time series with 45 annual observations, the explicit structure of statistical models matches the data scale better than deep learning’s parameter-heavy flexibility. Multivariate modeling shows mixed results: adding Pace and 3PAr alongside ORtg sometimes improves forecast accuracy by capturing interdependencies, but the added complexity can also introduce noise when sample sizes are limited.

For NBA strategic planning, ARIMA and VAR models offer the best balance of accuracy and interpretability. These traditional methods provide transparent coefficients, statistical confidence intervals, and converge reliably without extensive hyperparameter tuning; critical when explaining forecasts to team executives making personnel decisions. Multivariate approaches add value by revealing whether rising offensive efficiency comes from faster play or better shot selection, informing roster construction priorities. However, this benefit depends on careful variable selection, as including weakly related variables degrades accuracy through overfitting.

Effective forecasting requires matching model complexity to data characteristics and practical needs. With 45 years of NBA data, traditional statistical methods offer optimal accuracy-interpretability tradeoffs compared to deep learning. Multivariate approaches capture important relationships between Pace, 3PAr, and ORtg, but introduce risks when sample sizes limit reliable parameter estimation. Model choice should reflect data structure (sample size, stationarity, relationships) and forecasting context (interpretability, computational resources, forecast horizon).


Bridging to Interrupted Time Series Analysis

The models above—from ARIMA to LSTM—share a common limitation: they assume the data-generating process remains stable across the entire observation period. ARIMA coefficients stay constant whether we’re in 1980 or 2025. LSTM hidden states evolve, but the network architecture treats 2012 and 2020 identically. Even VAR models, which capture dynamic relationships, presume those relationships don’t fundamentally shift.

Yet this project’s narrative rests on a critical claim: the NBA experienced discrete structural breaks that standard time series models cannot adequately capture. The 2012 analytics inflection wasn’t a smooth acceleration in 3PAr; it was a regime change where front offices simultaneously adopted data-driven strategy. The COVID-19 pandemic didn’t slightly perturb attendance; it erased 90% of it overnight then gradually recovered. These are intervention effects—abrupt, external shocks that create before-and-after dynamics traditional models misspecify as smooth trends.

The next chapter shifts from forecasting to causal inference using interrupted time series (ITS) analysis. Rather than asking “What happens next?”, ITS asks “What changed when the intervention occurred?” This framework isolates treatment effects from underlying trends by modeling level shifts (immediate impact) and slope changes (sustained trajectory alterations) around precisely dated events.

What you’ll see next:

  • COVID-19 intervention (March 2020) quantified through Google Trends data on “sports betting” searches, measuring both the immediate collapse when sports stopped and the gradual recovery as leagues resumed
  • New York legalization (January 2022) analyzed to isolate how the largest U.S. betting market’s launch affected national search interest and industry engagement
  • Counterfactual trajectories showing what would have happened without these interventions, with the gap between actual and counterfactual quantifying causal effects
  • β₂ and β₃ coefficients interpreted as policy-relevant parameters: level change measures shock magnitude, slope change captures sustained impact
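The level-change (β₂) and slope-change (β₃) coefficients described above can be estimated with a simple segmented regression. A minimal sketch on synthetic data (the series, intervention timing, and effect sizes here are made up for illustration; the next chapter fits the real Google Trends series, and a statsmodels OLS fit would additionally give standard errors and confidence intervals):

```python
import numpy as np

rng = np.random.default_rng(0)
n, t0 = 60, 30                           # 60 periods, intervention at period 30
t = np.arange(n)
post = (t >= t0).astype(float)           # indicator: 1 after the intervention, else 0
t_since = np.where(post > 0, t - t0, 0)  # periods elapsed since the intervention

# Synthetic outcome: baseline trend, then a level drop and a steeper recovery slope
y = 50 + 0.2 * t - 10 * post + 0.5 * t_since + rng.normal(0, 1, n)

# Segmented (ITS) regression: y = b0 + b1*t + b2*post + b3*t_since
X = np.column_stack([np.ones(n), t, post, t_since])
b0, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]
print(f"level change (b2): {b2:.2f}, slope change (b3): {b3:.2f}")
```

The fitted b2 recovers the immediate shock (about -10) and b3 the sustained trajectory change (about +0.5), which is exactly the decomposition the bullet points describe.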

The transition from forecasting to causal inference reflects a shift in analytical intent. ARIMA/LSTM ask: “Given history, what’s likely next?” ITS asks: “Did this specific event cause that specific outcome?” Both use time series data, but with opposite epistemological goals: prediction versus explanation.

Why this matters for the overall narrative: Our story claims the analytics revolution and COVID pandemic transformed basketball and betting. But correlation isn’t causation. Did analytics cause efficiency gains, or did something else drive both? Did COVID’s attendance collapse cause betting interest to plummet, or were they independent shocks? ITS provides the statistical framework to separate coincidence from causation.

Where previous chapters modeled endogenous dynamics (how variables predict themselves and each other), the next chapter models exogenous shocks (how external events disrupt equilibrium). COVID didn’t emerge from past NBA trends; it was imposed from outside. New York’s legalization wasn’t forecast by prior betting data; it was a policy decision. ITS treats these as natural experiments, using regression discontinuity logic to isolate treatment effects.

The methodological progression mirrors the project’s conceptual arc:

  1. Univariate models (isolated trends)
  2. Multivariate models (interdependencies)
  3. Financial models (volatility dynamics)
  4. Deep learning (nonlinear patterns)
  5. Interrupted time series (causal effects of discrete events)

Each layer adds complexity, but the final layer adds something more fundamental: identification strategy. We’re no longer content to forecast; we’re testing whether specific interventions had measurable, statistically significant effects. This is the bridge from descriptive analytics to evaluative policy analysis.